Week 07: Code Generation & Operational Semantics

Code Generation

Introduction

Goal

Generating code for a stack machine with the accumulator. Run the code on a real machine, e.g., the MIPS processor.

Concept

The accumulator is kept in MIPS register $a0.
Keeps the stack in memory.
The address of the next location on the stack is kept in MIPS register $sp
- The top of the stack is at address $sp + 4

MIPS

We will compile the source code from Cool to MIPS assembly to execute it.

MIPS

RISC
Operation uses registers for operands and results.
Use load and store instructions to manipulate the value in memory.
32 general purpose registers x 32bits per register

MIPS Instruction

lw reg_1 offset(reg_2): Load 32-bit word in address reg_2 + offset into register reg_1
sw reg_1 offset(reg_2): Store 32-bit word in register reg_1 into address reg_2 + offset
add reg_tar reg_src reg_src2
addiu reg_tar reg_src imm: reg_tar = reg_src + imm (u means overflow is not checked.)
li reg imm: reg = imm
beq reg1 reg2 label: If reg1 = reg2, then jump to label.
b label: Jump to label.

Sample for 7+5

In compiler

acc <- 7
push acc

acc <- 5
acc <- acc + top_of_stack

pop

Code gen

li $a0 7
sw $a0 0($sp)
addiu $sp $sp -4
li $a0 5
lw $t1 4($sp)
add $a0 $a0 $t1
addiu $sp $sp 4

Code Generation

How it works

For expression e, generate MIPS code

Compute the value of e in $a0
Preserve $sp and the contents of the stack

Define a code generation function cgen(e) whose result is the code generated for e.

The code in source

cgen(e1 + e2) =
cgen(e1)
print "sw $a0 0($sp)"
print "addiu $sp $sp -4"
cgen(e2)
print "lw $t1 4($sp)"
print "add $a0 $t1 $a0"
print "addiu $sp $sp 4"

Remarks

Before entering the another cgen function, you should not save anything in the temporal register. Because the temporal register may be used after. Just use the stack machine.
Stack machine code generation is recursive.
Code generation can be written as a recursive-descent of the AST.
In different language, code for function calls and function definitions depends on the layout of the AR.

Simple AR for Cool

The result is always in the accumulator.
The activation record holds actual parameters.
The stack discipline guarantees that on function exit $sp is the same as it was on function entry.

On function exit

We need the return address to jump back.
A pointer to the current activation is useful.
This pointer lives in register $fp (frame pointer)
When entering a call to f(x,y) the AR of f will be pushed into the stack:
- old fp
- y
- x
- Return Address

How do we manage the variable's location

Problem: Because the stack grows when intermediate results are saved, the variables are not at a fixed offset from $sp
Solution
- Tutor: use a frame pointer
- I use another stack in code gen to save and search the variables in the stack.
  - load from $sp + (Stack_Size - Variable_Index)
The activation record must be designed together with the code generator
Code generation can be done by recursive traversal of the AST

Temporaries

How do we estimate the stack's size?

Basic

Let NT(e) = # of temps needed to evaluate the expression e

NT(e1 + e2)
- Needs at least as many temporaries as NT(e1)
- Needs at least as many temporaries as NT(e2) + 1
- Space used for temporaries in e1 can be reused for temporaries in e2
- NT(e1 + e2) = max(NT(e1), 1 + NT(e2))
NT(e1 - e2) = max(NT(e1), 1 + NT(e2))
NT(if e1 = e2 then e3 else e4) = max(NT(e1),1 + NT(e2), NT(e3), NT(e4))
NT(id(e1,...,en) = max(NT(e1),...,NT(en))
NT(int) = 0
NT(id) = 0

Function

For a function definition f(x1,...,xn) = e the AR has 2 + n + NT(e) elements

Return Address $ra
Frame Pointer $fp
n arguments
NT(e) locations for intermediate results

Tempories in AR

old fp
xn
...
x1
RA
Temp NT(e)
...
Temp 1

Remarks

Code generation must know how many temporaries are in use at each point
Add a new argument to code generation
the position of the next available temporary
The temporary area is used like a small, fixed-size stack

Object Layout

Basic

OO implementation = Basic code generation + More stuff
If B inherits A, code in class A works unmodified for an object of class B
Each attribute stored at a fixed offset in the object
The attribute is in the same place in every object of that class and inherited classes.

Object Layout

Offset +0: Class Tag (Int)
Offset +4: Object Size (Int)
Offset +8: Pointer to Class Method Table (Address)
Offset +12, 16, ...: Pointer to Attribute (Address)
- If C inherits B and B inherits A, then the sequence is A's attributes, B's attributes, and C's attributes.

Class Method Table Layout

If C inherits B and B inherits A
A contains f1, f2, f3
B contains f2, f3, f4, f5
C contains f3, f5, f6
Class C Method Table Layout
- A's method f1: A.f1
- A's method f2 overrided by B: B.f2
- A's method f3 overrided by C: C.f3
- B's method f4: B.f4
- B's method f5 overrided by C: C.f3
- C's method f6: C.f6

Cgen Semantics

Semantics Overview

Rules in every step

The tokens $\Rightarrow$ lexical analysis
The grammar $\Rightarrow$ syntactic analysis
The typing rules $\Rightarrow$ semantic analysis
The evaluation rules$\Rightarrow$ code generation and optimization
- The compilation of Cool to a stack machine.

Consideration in Assembly-language description

Whether to use a stack machine or not
Which way the stack grows
How integers are represented
The particular instruction set of the architecture

How to specify evaluation rules?

Many ways to specify semantics
Operational Semantics (What we use in Cool)
- Describes program evaluation via execution rules
  - on an abstract machine
- Most useful for specifying implementations
Denotational Semantics
- Program’s meaning is a mathematical function
Axiomatic Semantics
- Program behavior described via logical formulae
  - If execution begins in state satisfying X, then it ends in state satisfying Y
  - X, Y formulas
- Foundation of many program verification systems

Operational Semantics

TL; DR

Similar to typing rules. Read Coool Manual Ch.13 Operational semantics.